Dataset statistics
| Number of variables | 15 |
|---|---|
| Number of observations | 48842 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 49 |
| Duplicate rows (%) | 0.1% |
| Total size in memory | 5.6 MiB |
| Average record size in memory | 120.0 B |
Variable types
| Numeric | 12 |
|---|---|
| Categorical | 3 |
Alerts
| Alert | Type |
|---|---|
| Dataset has 49 (0.1%) duplicate rows | Duplicates |
| relationship is highly overall correlated with sex | High correlation |
| sex is highly overall correlated with relationship | High correlation |
| race is highly imbalanced (65.8%) | Imbalance |
| workclass has 2799 (5.7%) zeros | Zeros |
| education has 1389 (2.8%) zeros | Zeros |
| marital-status has 6633 (13.6%) zeros | Zeros |
| occupation has 2809 (5.8%) zeros | Zeros |
| relationship has 19716 (40.4%) zeros | Zeros |
| capital-gain has 44807 (91.7%) zeros | Zeros |
| capital-loss has 46560 (95.3%) zeros | Zeros |
| native-country has 857 (1.8%) zeros | Zeros |
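Every percentage in these alerts, and in the Frequency (%) columns further down, is the count divided by the 48842 observations, times 100, shown to one decimal place. Judging from the tables themselves, non-zero counts that would round to 0.0% are displayed as "< 0.1%"; that display rule is inferred from this report, not taken from ydata-profiling's source. The Python sketch below (the function name `format_frequency` is hypothetical) reproduces the zero-count alerts under that assumption:

```python
def format_frequency(count: int, n: int) -> str:
    """Render count/n as a percentage in the style this report appears to use.

    Assumed rule (inferred from the tables, not from ydata-profiling's code):
    an exact zero is "0.0%", a non-zero share below 0.05% is "< 0.1%",
    anything else is rounded to one decimal place.
    """
    pct = 100.0 * count / n
    if count == 0:
        return "0.0%"
    if pct < 0.05:
        return "< 0.1%"
    return f"{pct:.1f}%"


N_OBS = 48842  # "Number of observations" from the dataset statistics table

# Zero counts copied from the alerts table above.
zero_counts = {
    "workclass": 2799,
    "education": 1389,
    "marital-status": 6633,
    "occupation": 2809,
    "relationship": 19716,
    "capital-gain": 44807,
    "capital-loss": 46560,
    "native-country": 857,
}

for column, zeros in zero_counts.items():
    print(f"{column} has {zeros} ({format_frequency(zeros, N_OBS)}) zeros")
```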
Reproduction
| Analysis started | 2023-12-07 22:50:05.723454 |
|---|---|
| Analysis finished | 2023-12-07 22:50:22.477694 |
| Duration | 16.75 seconds |
| Software version | ydata-profiling v4.0.0 |
| Download configuration | config.json |
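The Duration row is the difference between the two timestamps above. A minimal check using only the Python standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction table above.
started = datetime.fromisoformat("2023-12-07 22:50:05.723454")
finished = datetime.fromisoformat("2023-12-07 22:50:22.477694")

# timedelta.total_seconds() keeps the microsecond part.
duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```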
age
Real number (ℝ)
| Distinct | 74 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 38.643585 |
| Minimum | 17 |
| Maximum | 90 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 381.7 KiB |
Quantile statistics
| Minimum | 17 |
|---|---|
| 5-th percentile | 19 |
| Q1 | 28 |
| median | 37 |
| Q3 | 48 |
| 95-th percentile | 63 |
| Maximum | 90 |
| Range | 73 |
| Interquartile range (IQR) | 20 |
Descriptive statistics
| Standard deviation | 13.71051 |
|---|---|
| Coefficient of variation (CV) | 0.35479394 |
| Kurtosis | -0.18426874 |
| Mean | 38.643585 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | 0.55758032 |
| Sum | 1887430 |
| Variance | 187.97808 |
| Monotonicity | Not monotonic |
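Several of the age statistics are derived from others: CV = standard deviation / mean, Variance = standard deviation squared, Sum = mean × number of observations, Range = Maximum - Minimum and IQR = Q3 - Q1. The sketch below recomputes them from the rounded figures shown, so agreement is only to the displayed precision:

```python
# Headline figures copied from the age tables above.
n = 48842                 # Number of observations
mean = 38.643585
std = 13.71051
minimum, maximum = 17, 90
q1, q3 = 28, 48

cv = std / mean           # Coefficient of variation
variance = std ** 2       # Variance is the squared standard deviation
total = mean * n          # Sum = mean x number of observations
value_range = maximum - minimum
iqr = q3 - q1             # Interquartile range

print(f"CV={cv:.6f} Variance={variance:.3f} Sum={total:.0f} Range={value_range} IQR={iqr}")
```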
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 36 | 1348 | 2.8% |
| 35 | 1337 | 2.7% |
| 33 | 1335 | 2.7% |
| 23 | 1329 | 2.7% |
| 31 | 1325 | 2.7% |
| 34 | 1303 | 2.7% |
| 28 | 1280 | 2.6% |
| 37 | 1280 | 2.6% |
| 30 | 1278 | 2.6% |
| 38 | 1264 | 2.6% |
| Other values (64) | 35763 | 73.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 17 | 595 | 1.2% |
| 18 | 862 | 1.8% |
| 19 | 1053 | 2.2% |
| 20 | 1113 | 2.3% |
| 21 | 1096 | 2.2% |
| 22 | 1178 | 2.4% |
| 23 | 1329 | 2.7% |
| 24 | 1206 | 2.5% |
| 25 | 1195 | 2.4% |
| 26 | 1153 | 2.4% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 90 | 55 | 0.1% |
| 89 | 2 | < 0.1% |
| 88 | 6 | < 0.1% |
| 87 | 3 | < 0.1% |
| 86 | 1 | < 0.1% |
| 85 | 5 | < 0.1% |
| 84 | 13 | < 0.1% |
| 83 | 11 | < 0.1% |
| 82 | 15 | < 0.1% |
| 81 | 37 | 0.1% |
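The "Other values (64)" row for age is what remains after the ten most frequent values: 74 distinct values minus the 10 listed, and 48842 observations minus the listed counts. A short Python sketch of that bookkeeping, using the counts from the age Common Values table:

```python
N_OBS = 48842      # Number of observations
N_DISTINCT = 74    # "Distinct" for age

# Top-10 rows of the age Common Values table: (value, count)
top10 = [(36, 1348), (35, 1337), (33, 1335), (23, 1329), (31, 1325),
         (34, 1303), (28, 1280), (37, 1280), (30, 1278), (38, 1264)]

other_count = N_OBS - sum(count for _, count in top10)   # rows not listed
other_distinct = N_DISTINCT - len(top10)                 # values not listed
other_pct = 100 * other_count / N_OBS

print(f"Other values ({other_distinct}) | {other_count} | {other_pct:.1f}%")
```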
workclass
Real number (ℝ)
| Distinct | 9 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.8704394 |
| Minimum | 0 |
| Maximum | 8 |
| Zeros | 2799 |
| Zeros (%) | 5.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 381.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 4 |
| median | 4 |
| Q3 | 4 |
| 95-th percentile | 6 |
| Maximum | 8 |
| Range | 8 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.4642337 |
|---|---|
| Coefficient of variation (CV) | 0.378312 |
| Kurtosis | 1.6419714 |
| Mean | 3.8704394 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -0.74790973 |
| Sum | 189040 |
| Variance | 2.1439803 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 33906 | 69.4% |
| 6 | 3862 | 7.9% |
| 2 | 3136 | 6.4% |
| 0 | 2799 | 5.7% |
| 7 | 1981 | 4.1% |
| 5 | 1695 | 3.5% |
| 1 | 1432 | 2.9% |
| 8 | 21 | < 0.1% |
| 3 | 10 | < 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2799 | 5.7% |
| 1 | 1432 | 2.9% |
| 2 | 3136 | 6.4% |
| 3 | 10 | < 0.1% |
| 4 | 33906 | 69.4% |
| 5 | 1695 | 3.5% |
| 6 | 3862 | 7.9% |
| 7 | 1981 | 4.1% |
| 8 | 21 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 8 | 21 | < 0.1% |
| 7 | 1981 | 4.1% |
| 6 | 3862 | 7.9% |
| 5 | 1695 | 3.5% |
| 4 | 33906 | 69.4% |
| 3 | 10 | < 0.1% |
| 2 | 3136 | 6.4% |
| 1 | 1432 | 2.9% |
| 0 | 2799 | 5.7% |
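Because workclass takes only nine encoded values, its Mean, Sum, Variance and Standard deviation can be recomputed directly from the frequency table. In this sketch the reported variance (2.1439803) is matched only with the sample denominator n - 1; dividing by n gives about 2.143936 instead. That suggests, but does not prove, that the profiler reports the sample variance:

```python
import math

# workclass frequency table copied from above: encoded value -> count
counts = {0: 2799, 1: 1432, 2: 3136, 3: 10, 4: 33906,
          5: 1695, 6: 3862, 7: 1981, 8: 21}

n = sum(counts.values())                               # number of observations
total = sum(value * c for value, c in counts.items())  # the "Sum" statistic
mean = total / n

# Sum of squared deviations from the mean, weighted by each value's count
ss = sum(c * (value - mean) ** 2 for value, c in counts.items())
sample_variance = ss / (n - 1)    # n - 1 denominator (ddof=1)
population_variance = ss / n      # n denominator, for comparison
std = math.sqrt(sample_variance)

print(n, total, f"{mean:.7f}", f"{sample_variance:.7f}",
      f"{population_variance:.7f}", f"{std:.7f}")
```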
fnlwgt
Real number (ℝ)
| Distinct | 28523 |
|---|---|
| Distinct (%) | 58.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 189664.13 |
| Minimum | 12285 |
| Maximum | 1490400 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 381.7 KiB |
Quantile statistics
| Minimum | 12285 |
|---|---|
| 5-th percentile | 39615.4 |
| Q1 | 117550.5 |
| median | 178144.5 |
| Q3 | 237642 |
| 95-th percentile | 379481.65 |
| Maximum | 1490400 |
| Range | 1478115 |
| Interquartile range (IQR) | 120091.5 |
Descriptive statistics
| Standard deviation | 105604.03 |
|---|---|
| Coefficient of variation (CV) | 0.55679491 |
| Kurtosis | 6.0578482 |
| Mean | 189664.13 |
| Median Absolute Deviation (MAD) | 60295.5 |
| Skewness | 1.4388919 |
| Sum | 9.2635757 × 10⁹ |
| Variance | 1.115221 × 10¹⁰ |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 203488 | 21 | < 0.1% |
| 120277 | 19 | < 0.1% |
| 190290 | 19 | < 0.1% |
| 125892 | 18 | < 0.1% |
| 126569 | 18 | < 0.1% |
| 126675 | 17 | < 0.1% |
| 113364 | 17 | < 0.1% |
| 99185 | 17 | < 0.1% |
| 186934 | 16 | < 0.1% |
| 111567 | 16 | < 0.1% |
| Other values (28513) | 48664 | 99.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 12285 | 1 | < 0.1% |
| 13492 | 1 | < 0.1% |
| 13769 | 3 | < 0.1% |
| 13862 | 1 | < 0.1% |
| 14878 | 1 | < 0.1% |
| 18827 | 1 | < 0.1% |
| 19214 | 1 | < 0.1% |
| 19302 | 6 | < 0.1% |
| 19395 | 2 | < 0.1% |
| 19410 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1490400 | 1 | < 0.1% |
| 1484705 | 1 | < 0.1% |
| 1455435 | 1 | < 0.1% |
| 1366120 | 1 | < 0.1% |
| 1268339 | 1 | < 0.1% |
| 1226583 | 1 | < 0.1% |
| 1210504 | 1 | < 0.1% |
| 1184622 | 1 | < 0.1% |
| 1161363 | 1 | < 0.1% |
| 1125613 | 1 | < 0.1% |
education
Real number (ℝ)
| Distinct | 16 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.28842 |
| Minimum | 0 |
| Maximum | 15 |
| Zeros | 1389 |
| Zeros (%) | 2.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 381.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 9 |
| median | 11 |
| Q3 | 12 |
| 95-th percentile | 15 |
| Maximum | 15 |
| Range | 15 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 3.8744924 |
|---|---|
| Coefficient of variation (CV) | 0.37658771 |
| Kurtosis | 0.67657631 |
| Mean | 10.28842 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.93629868 |
| Sum | 502507 |
| Variance | 15.011692 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 11 | 15784 | 32.3% |
| 15 | 10878 | 22.3% |
| 9 | 8025 | 16.4% |
| 12 | 2657 | 5.4% |
| 8 | 2061 | 4.2% |
| 1 | 1812 | 3.7% |
| 7 | 1601 | 3.3% |
| 0 | 1389 | 2.8% |
| 5 | 955 | 2.0% |
| 14 | 834 | 1.7% |
| Other values (6) | 2846 | 5.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1389 | 2.8% |
| 1 | 1812 | 3.7% |
| 2 | 657 | 1.3% |
| 3 | 247 | 0.5% |
| 4 | 509 | 1.0% |
| 5 | 955 | 2.0% |
| 6 | 756 | 1.5% |
| 7 | 1601 | 3.3% |
| 8 | 2061 | 4.2% |
| 9 | 8025 | 16.4% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 15 | 10878 | 22.3% |
| 14 | 834 | 1.7% |
| 13 | 83 | 0.2% |
| 12 | 2657 | 5.4% |
| 11 | 15784 | 32.3% |
| 10 | 594 | 1.2% |
| 9 | 8025 | 16.4% |
| 8 | 2061 | 4.2% |
| 7 | 1601 | 3.3% |
| 6 | 756 | 1.5% |
education-num
Real number (ℝ)
| Distinct | 16 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.078089 |
| Minimum | 1 |
| Maximum | 16 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 381.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 9 |
| median | 10 |
| Q3 | 12 |
| 95-th percentile | 14 |
| Maximum | 16 |
| Range | 15 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 2.5709728 |
|---|---|
| Coefficient of variation (CV) | 0.2551052 |
| Kurtosis | 0.62574527 |
| Mean | 10.078089 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | -0.31652486 |
| Sum | 492234 |
| Variance | 6.6099009 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 9 | 15784 | 32.3% |
| 10 | 10878 | 22.3% |
| 13 | 8025 | 16.4% |
| 14 | 2657 | 5.4% |
| 11 | 2061 | 4.2% |
| 7 | 1812 | 3.7% |
| 12 | 1601 | 3.3% |
| 6 | 1389 | 2.8% |
| 4 | 955 | 2.0% |
| 15 | 834 | 1.7% |
| Other values (6) | 2846 | 5.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 83 | 0.2% |
| 2 | 247 | 0.5% |
| 3 | 509 | 1.0% |
| 4 | 955 | 2.0% |
| 5 | 756 | 1.5% |
| 6 | 1389 | 2.8% |
| 7 | 1812 | 3.7% |
| 8 | 657 | 1.3% |
| 9 | 15784 | 32.3% |
| 10 | 10878 | 22.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 16 | 594 | 1.2% |
| 15 | 834 | 1.7% |
| 14 | 2657 | 5.4% |
| 13 | 8025 | 16.4% |
| 12 | 1601 | 3.3% |
| 11 | 2061 | 4.2% |
| 10 | 10878 | 22.3% |
| 9 | 15784 | 32.3% |
| 8 | 657 | 1.3% |
| 7 | 1812 | 3.7% |
marital-status
Real number (ℝ)
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.6187503 |
| Minimum | 0 |
| Maximum | 6 |
| Zeros | 6633 |
| Zeros (%) | 13.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 381.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2 |
| median | 2 |
| Q3 | 4 |
| 95-th percentile | 5 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.5077026 |
|---|---|
| Coefficient of variation (CV) | 0.57573361 |
| Kurtosis | -0.53619399 |
| Mean | 2.6187503 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.01632824 |
| Sum | 127905 |
| Variance | 2.273167 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 22379 | 45.8% |
| 4 | 16117 | 33.0% |
| 0 | 6633 | 13.6% |
| 5 | 1530 | 3.1% |
| 6 | 1518 | 3.1% |
| 3 | 628 | 1.3% |
| 1 | 37 | 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6633 | 13.6% |
| 1 | 37 | 0.1% |
| 2 | 22379 | 45.8% |
| 3 | 628 | 1.3% |
| 4 | 16117 | 33.0% |
| 5 | 1530 | 3.1% |
| 6 | 1518 | 3.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 1518 | 3.1% |
| 5 | 1530 | 3.1% |
| 4 | 16117 | 33.0% |
| 3 | 628 | 1.3% |
| 2 | 22379 | 45.8% |
| 1 | 37 | 0.1% |
| 0 | 6633 | 13.6% |
occupation
Real number (ℝ)
| Distinct | 15 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.5776995 |
| Minimum | 0 |
| Maximum | 14 |
| Zeros | 2809 |
| Zeros (%) | 5.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 381.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 3 |
| median | 7 |
| Q3 | 10 |
| 95-th percentile | 13 |
| Maximum | 14 |
| Range | 14 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 4.2305094 |
|---|---|
| Coefficient of variation (CV) | 0.64315942 |
| Kurtosis | -1.2362805 |
| Mean | 6.5776995 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 0.1105506 |
| Sum | 321268 |
| Variance | 17.89721 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 10 | 6172 | 12.6% |
| 3 | 6112 | 12.5% |
| 4 | 6086 | 12.5% |
| 1 | 5611 | 11.5% |
| 12 | 5504 | 11.3% |
| 8 | 4923 | 10.1% |
| 7 | 3022 | 6.2% |
| 0 | 2809 | 5.8% |
| 14 | 2355 | 4.8% |
| 6 | 2072 | 4.2% |
| Other values (5) | 4176 | 8.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2809 | 5.8% |
| 1 | 5611 | 11.5% |
| 2 | 15 | < 0.1% |
| 3 | 6112 | 12.5% |
| 4 | 6086 | 12.5% |
| 5 | 1490 | 3.1% |
| 6 | 2072 | 4.2% |
| 7 | 3022 | 6.2% |
| 8 | 4923 | 10.1% |
| 9 | 242 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 14 | 2355 | 4.8% |
| 13 | 1446 | 3.0% |
| 12 | 5504 | 11.3% |
| 11 | 983 | 2.0% |
| 10 | 6172 | 12.6% |
| 9 | 242 | 0.5% |
| 8 | 4923 | 10.1% |
| 7 | 3022 | 6.2% |
| 6 | 2072 | 4.2% |
| 5 | 1490 | 3.1% |
relationship
Real number (ℝ)
HIGH CORRELATION  ZEROS 
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.4432865 |
| Minimum | 0 |
| Maximum | 5 |
| Zeros | 19716 |
| Zeros (%) | 40.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 381.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 3 |
| 95-th percentile | 4 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1.6021512 |
|---|---|
| Coefficient of variation (CV) | 1.1100715 |
| Kurtosis | -0.75411638 |
| Mean | 1.4432865 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.79171931 |
| Sum | 70493 |
| Variance | 2.5668885 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 19716 | 40.4% |
| 1 | 12583 | 25.8% |
| 3 | 7581 | 15.5% |
| 4 | 5125 | 10.5% |
| 5 | 2331 | 4.8% |
| 2 | 1506 | 3.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 19716 | 40.4% |
| 1 | 12583 | 25.8% |
| 2 | 1506 | 3.1% |
| 3 | 7581 | 15.5% |
| 4 | 5125 | 10.5% |
| 5 | 2331 | 4.8% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 2331 | 4.8% |
| 4 | 5125 | 10.5% |
| 3 | 7581 | 15.5% |
| 2 | 1506 | 3.1% |
| 1 | 12583 | 25.8% |
| 0 | 19716 | 40.4% |
race
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 381.7 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 48842 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 4 |
|---|---|
| 2nd row | 4 |
| 3rd row | 4 |
| 4th row | 2 |
| 5th row | 2 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 41762 | 85.5% |
| 2 | 4685 | 9.6% |
| 1 | 1519 | 3.1% |
| 0 | 470 | 1.0% |
| 3 | 406 | 0.8% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 41762 | 85.5% |
| 2 | 4685 | 9.6% |
| 1 | 1519 | 3.1% |
| 0 | 470 | 1.0% |
| 3 | 406 | 0.8% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 48842 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 41762 | 85.5% |
| 2 | 4685 | 9.6% |
| 1 | 1519 | 3.1% |
| 0 | 470 | 1.0% |
| 3 | 406 | 0.8% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 48842 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 41762 | 85.5% |
| 2 | 4685 | 9.6% |
| 1 | 1519 | 3.1% |
| 0 | 470 | 1.0% |
| 3 | 406 | 0.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 48842 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 41762 | 85.5% |
| 2 | 4685 | 9.6% |
| 1 | 1519 | 3.1% |
| 0 | 470 | 1.0% |
| 3 | 406 | 0.8% |
sex
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 381.7 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 48842 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 32650 | 66.8% |
| 0 | 16192 | 33.2% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 32650 | 66.8% |
| 0 | 16192 | 33.2% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 48842 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 32650 | 66.8% |
| 0 | 16192 | 33.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 48842 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 32650 | 66.8% |
| 0 | 16192 | 33.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 48842 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 32650 | 66.8% |
| 0 | 16192 | 33.2% |
capital-gain
Real number (ℝ)
| Distinct | 123 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1079.0676 |
| Minimum | 0 |
| Maximum | 99999 |
| Zeros | 44807 |
| Zeros (%) | 91.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 381.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 5013 |
| Maximum | 99999 |
| Range | 99999 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 7452.0191 |
|---|---|
| Coefficient of variation (CV) | 6.9059796 |
| Kurtosis | 152.6931 |
| Mean | 1079.0676 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 11.894659 |
| Sum | 52703821 |
| Variance | 55532588 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 44807 | 91.7% |
| 15024 | 513 | 1.1% |
| 7688 | 410 | 0.8% |
| 7298 | 364 | 0.7% |
| 99999 | 244 | 0.5% |
| 3103 | 152 | 0.3% |
| 5178 | 146 | 0.3% |
| 5013 | 117 | 0.2% |
| 4386 | 108 | 0.2% |
| 8614 | 82 | 0.2% |
| Other values (113) | 1899 | 3.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 44807 | 91.7% |
| 114 | 8 | < 0.1% |
| 401 | 5 | < 0.1% |
| 594 | 52 | 0.1% |
| 914 | 10 | < 0.1% |
| 991 | 6 | < 0.1% |
| 1055 | 37 | 0.1% |
| 1086 | 8 | < 0.1% |
| 1111 | 1 | < 0.1% |
| 1151 | 13 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 99999 | 244 | 0.5% |
| 41310 | 3 | < 0.1% |
| 34095 | 6 | < 0.1% |
| 27828 | 58 | 0.1% |
| 25236 | 14 | < 0.1% |
| 25124 | 6 | < 0.1% |
| 22040 | 1 | < 0.1% |
| 20051 | 49 | 0.1% |
| 18481 | 2 | < 0.1% |
| 15831 | 8 | < 0.1% |
capital-loss
Real number (ℝ)
| Distinct | 99 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 87.502314 |
| Minimum | 0 |
| Maximum | 4356 |
| Zeros | 46560 |
| Zeros (%) | 95.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 381.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 4356 |
| Range | 4356 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 403.00455 |
|---|---|
| Coefficient of variation (CV) | 4.6056445 |
| Kurtosis | 20.014346 |
| Mean | 87.502314 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.5698089 |
| Sum | 4273788 |
| Variance | 162412.67 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 46560 | 95.3% |
| 1902 | 304 | 0.6% |
| 1977 | 253 | 0.5% |
| 1887 | 233 | 0.5% |
| 2415 | 72 | 0.1% |
| 1485 | 71 | 0.1% |
| 1848 | 67 | 0.1% |
| 1590 | 62 | 0.1% |
| 1602 | 62 | 0.1% |
| 1876 | 59 | 0.1% |
| Other values (89) | 1099 | 2.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 46560 | 95.3% |
| 155 | 1 | < 0.1% |
| 213 | 5 | < 0.1% |
| 323 | 5 | < 0.1% |
| 419 | 3 | < 0.1% |
| 625 | 17 | < 0.1% |
| 653 | 4 | < 0.1% |
| 810 | 2 | < 0.1% |
| 880 | 6 | < 0.1% |
| 974 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4356 | 3 | < 0.1% |
| 3900 | 2 | < 0.1% |
| 3770 | 4 | < 0.1% |
| 3683 | 2 | < 0.1% |
| 3175 | 2 | < 0.1% |
| 3004 | 5 | < 0.1% |
| 2824 | 14 | < 0.1% |
| 2754 | 2 | < 0.1% |
| 2603 | 7 | < 0.1% |
| 2559 | 17 | < 0.1% |
hours-per-week
Real number (ℝ)
| Distinct | 96 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 40.422382 |
| Minimum | 1 |
| Maximum | 99 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 381.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 17.05 |
| Q1 | 40 |
| median | 40 |
| Q3 | 45 |
| 95-th percentile | 60 |
| Maximum | 99 |
| Range | 98 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 12.391444 |
|---|---|
| Coefficient of variation (CV) | 0.30654908 |
| Kurtosis | 2.9510591 |
| Mean | 40.422382 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.23874966 |
| Sum | 1974310 |
| Variance | 153.54789 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 40 | 22803 | 46.7% |
| 50 | 4246 | 8.7% |
| 45 | 2717 | 5.6% |
| 60 | 2177 | 4.5% |
| 35 | 1937 | 4.0% |
| 20 | 1862 | 3.8% |
| 30 | 1700 | 3.5% |
| 55 | 1051 | 2.2% |
| 25 | 958 | 2.0% |
| 48 | 770 | 1.6% |
| Other values (86) | 8621 | 17.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 27 | 0.1% |
| 2 | 53 | 0.1% |
| 3 | 59 | 0.1% |
| 4 | 84 | 0.2% |
| 5 | 95 | 0.2% |
| 6 | 92 | 0.2% |
| 7 | 45 | 0.1% |
| 8 | 218 | 0.4% |
| 9 | 27 | 0.1% |
| 10 | 425 | 0.9% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 99 | 137 | 0.3% |
| 98 | 14 | < 0.1% |
| 97 | 2 | < 0.1% |
| 96 | 9 | < 0.1% |
| 95 | 2 | < 0.1% |
| 94 | 1 | < 0.1% |
| 92 | 3 | < 0.1% |
| 91 | 3 | < 0.1% |
| 90 | 42 | 0.1% |
| 89 | 3 | < 0.1% |
native-country
Real number (ℝ)
| Distinct | 42 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 36.749355 |
| Minimum | 0 |
| Maximum | 41 |
| Zeros | 857 |
| Zeros (%) | 1.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 381.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 19 |
| Q1 | 39 |
| median | 39 |
| Q3 | 39 |
| 95-th percentile | 39 |
| Maximum | 41 |
| Range | 41 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 7.7753432 |
|---|---|
| Coefficient of variation (CV) | 0.21157768 |
| Kurtosis | 12.772293 |
| Mean | 36.749355 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -3.6895285 |
| Sum | 1794912 |
| Variance | 60.455961 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 39 | 43832 | 89.7% |
| 26 | 951 | 1.9% |
| 0 | 857 | 1.8% |
| 30 | 295 | 0.6% |
| 11 | 206 | 0.4% |
| 33 | 184 | 0.4% |
| 2 | 182 | 0.4% |
| 8 | 155 | 0.3% |
| 19 | 151 | 0.3% |
| 5 | 138 | 0.3% |
| Other values (32) | 1891 | 3.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 857 | 1.8% |
| 1 | 28 | 0.1% |
| 2 | 182 | 0.4% |
| 3 | 122 | 0.2% |
| 4 | 85 | 0.2% |
| 5 | 138 | 0.3% |
| 6 | 103 | 0.2% |
| 7 | 45 | 0.1% |
| 8 | 155 | 0.3% |
| 9 | 127 | 0.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 41 | 23 | < 0.1% |
| 40 | 86 | 0.2% |
| 39 | 43832 | 89.7% |
| 38 | 27 | 0.1% |
| 37 | 30 | 0.1% |
| 36 | 65 | 0.1% |
| 35 | 115 | 0.2% |
| 34 | 21 | < 0.1% |
| 33 | 184 | 0.4% |
| 32 | 67 | 0.1% |
target
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 381.7 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 48842 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 37155 | 76.1% |
| 0 | 11687 | 23.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 37155 | 76.1% |
| 0 | 11687 | 23.9% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 48842 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 37155 | 76.1% |
| 0 | 11687 | 23.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 48842 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 37155 | 76.1% |
| 0 | 11687 | 23.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 48842 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 37155 | 76.1% |
| 0 | 11687 | 23.9% |
Correlations
| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | capital-gain | capital-loss | hours-per-week | native-country | race | sex | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| age | 1.000 | 0.069 | -0.078 | -0.032 | 0.063 | -0.373 | 0.002 | -0.322 | 0.124 | 0.058 | 0.147 | 0.005 | 0.027 | 0.125 | 0.316 |
| workclass | 0.069 | 1.000 | -0.029 | 0.002 | 0.044 | -0.074 | 0.211 | -0.117 | 0.030 | 0.013 | 0.135 | -0.007 | 0.057 | 0.151 | 0.181 |
| fnlwgt | -0.078 | -0.029 | 1.000 | -0.014 | -0.030 | 0.036 | 0.001 | 0.014 | -0.009 | -0.001 | -0.016 | -0.075 | 0.070 | 0.028 | 0.010 |
| education | -0.032 | 0.002 | -0.014 | 1.000 | 0.213 | -0.010 | -0.034 | 0.019 | 0.007 | 0.008 | 0.011 | 0.082 | 0.057 | 0.066 | 0.252 |
| education-num | 0.063 | 0.044 | -0.030 | 0.213 | 1.000 | -0.065 | 0.116 | -0.093 | 0.119 | 0.077 | 0.164 | 0.052 | 0.067 | 0.073 | 0.360 |
| marital-status | -0.373 | -0.074 | 0.036 | -0.010 | -0.065 | 1.000 | -0.019 | 0.318 | -0.075 | -0.043 | -0.207 | -0.029 | 0.082 | 0.459 | 0.448 |
| occupation | 0.002 | 0.211 | 0.001 | -0.034 | 0.116 | -0.019 | 1.000 | -0.076 | 0.021 | 0.019 | 0.088 | -0.005 | 0.071 | 0.374 | 0.312 |
| relationship | -0.322 | -0.117 | 0.014 | 0.019 | -0.093 | 0.318 | -0.076 | 1.000 | -0.101 | -0.064 | -0.303 | -0.010 | 0.097 | 0.646 | 0.454 |
| capital-gain | 0.124 | 0.030 | -0.009 | 0.007 | 0.119 | -0.075 | 0.021 | -0.101 | 1.000 | -0.066 | 0.092 | 0.017 | 0.013 | 0.049 | 0.271 |
| capital-loss | 0.058 | 0.013 | -0.001 | 0.008 | 0.077 | -0.043 | 0.019 | -0.064 | -0.066 | 1.000 | 0.060 | 0.009 | 0.012 | 0.064 | 0.197 |
| hours-per-week | 0.147 | 0.135 | -0.016 | 0.011 | 0.164 | -0.207 | 0.088 | -0.303 | 0.092 | 0.060 | 1.000 | 0.012 | 0.058 | 0.240 | 0.269 |
| native-country | 0.005 | -0.007 | -0.075 | 0.082 | 0.052 | -0.029 | -0.005 | -0.010 | 0.017 | 0.009 | 0.012 | 1.000 | 0.267 | 0.039 | 0.081 |
| race | 0.027 | 0.057 | 0.070 | 0.057 | 0.067 | 0.082 | 0.071 | 0.097 | 0.013 | 0.012 | 0.058 | 0.267 | 1.000 | 0.114 | 0.099 |
| sex | 0.125 | 0.151 | 0.028 | 0.066 | 0.073 | 0.459 | 0.374 | 0.646 | 0.049 | 0.064 | 0.240 | 0.039 | 0.114 | 1.000 | 0.215 |
| target | 0.316 | 0.181 | 0.010 | 0.252 | 0.360 | 0.448 | 0.312 | 0.454 | 0.271 | 0.197 | 0.269 | 0.081 | 0.099 | 0.215 | 1.000 |
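The two High correlation alerts correspond to the largest off-diagonal entry in this matrix (relationship and sex, 0.646). Because the matrix is symmetric, it is enough to scan the upper triangle. A minimal Python sketch that lists pairs at or above a cutoff; the default of 0.5 is an illustrative assumption, not necessarily the threshold ydata-profiling uses when it raises its alert:

```python
# Column order of the correlation table above.
COLUMNS = ["age", "workclass", "fnlwgt", "education", "education-num",
           "marital-status", "occupation", "relationship", "capital-gain",
           "capital-loss", "hours-per-week", "native-country", "race", "sex", "target"]

# Rows copied from the table: row label followed by 15 coefficients.
ROWS = """
age 1.000 0.069 -0.078 -0.032 0.063 -0.373 0.002 -0.322 0.124 0.058 0.147 0.005 0.027 0.125 0.316
workclass 0.069 1.000 -0.029 0.002 0.044 -0.074 0.211 -0.117 0.030 0.013 0.135 -0.007 0.057 0.151 0.181
fnlwgt -0.078 -0.029 1.000 -0.014 -0.030 0.036 0.001 0.014 -0.009 -0.001 -0.016 -0.075 0.070 0.028 0.010
education -0.032 0.002 -0.014 1.000 0.213 -0.010 -0.034 0.019 0.007 0.008 0.011 0.082 0.057 0.066 0.252
education-num 0.063 0.044 -0.030 0.213 1.000 -0.065 0.116 -0.093 0.119 0.077 0.164 0.052 0.067 0.073 0.360
marital-status -0.373 -0.074 0.036 -0.010 -0.065 1.000 -0.019 0.318 -0.075 -0.043 -0.207 -0.029 0.082 0.459 0.448
occupation 0.002 0.211 0.001 -0.034 0.116 -0.019 1.000 -0.076 0.021 0.019 0.088 -0.005 0.071 0.374 0.312
relationship -0.322 -0.117 0.014 0.019 -0.093 0.318 -0.076 1.000 -0.101 -0.064 -0.303 -0.010 0.097 0.646 0.454
capital-gain 0.124 0.030 -0.009 0.007 0.119 -0.075 0.021 -0.101 1.000 -0.066 0.092 0.017 0.013 0.049 0.271
capital-loss 0.058 0.013 -0.001 0.008 0.077 -0.043 0.019 -0.064 -0.066 1.000 0.060 0.009 0.012 0.064 0.197
hours-per-week 0.147 0.135 -0.016 0.011 0.164 -0.207 0.088 -0.303 0.092 0.060 1.000 0.012 0.058 0.240 0.269
native-country 0.005 -0.007 -0.075 0.082 0.052 -0.029 -0.005 -0.010 0.017 0.009 0.012 1.000 0.267 0.039 0.081
race 0.027 0.057 0.070 0.057 0.067 0.082 0.071 0.097 0.013 0.012 0.058 0.267 1.000 0.114 0.099
sex 0.125 0.151 0.028 0.066 0.073 0.459 0.374 0.646 0.049 0.064 0.240 0.039 0.114 1.000 0.215
target 0.316 0.181 0.010 0.252 0.360 0.448 0.312 0.454 0.271 0.197 0.269 0.081 0.099 0.215 1.000
"""


def parse_matrix(text):
    """Turn the whitespace-separated rows into {row: {column: coefficient}}."""
    matrix = {}
    for line in text.strip().splitlines():
        label, *values = line.split()
        if len(values) != len(COLUMNS):
            raise ValueError(f"row {label!r} has {len(values)} values")
        matrix[label] = dict(zip(COLUMNS, map(float, values)))
    return matrix


def strong_pairs(matrix, threshold=0.5):
    """Return (a, b, r) for each unordered pair with |r| >= threshold, strongest first."""
    pairs = []
    for i, a in enumerate(COLUMNS):
        for b in COLUMNS[i + 1:]:          # upper triangle only, no self-pairs
            r = matrix[a][b]
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return sorted(pairs, key=lambda p: -abs(p[2]))


print(strong_pairs(parse_matrix(ROWS)))
```

Lowering the cutoff (for example `strong_pairs(matrix, 0.45)`) also surfaces marital-status/sex and relationship/target.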
First rows
| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39.0 | 7 | 77516.0 | 9 | 13.0 | 4 | 1 | 1 | 4 | 1 | 2174.0 | 0.0 | 40.0 | 39 | 1 |
| 1 | 50.0 | 6 | 83311.0 | 9 | 13.0 | 2 | 4 | 0 | 4 | 1 | 0.0 | 0.0 | 13.0 | 39 | 1 |
| 2 | 38.0 | 4 | 215646.0 | 11 | 9.0 | 0 | 6 | 1 | 4 | 1 | 0.0 | 0.0 | 40.0 | 39 | 1 |
| 3 | 53.0 | 4 | 234721.0 | 1 | 7.0 | 2 | 6 | 0 | 2 | 1 | 0.0 | 0.0 | 40.0 | 39 | 1 |
| 4 | 28.0 | 4 | 338409.0 | 9 | 13.0 | 2 | 10 | 5 | 2 | 0 | 0.0 | 0.0 | 40.0 | 5 | 1 |
| 5 | 37.0 | 4 | 284582.0 | 12 | 14.0 | 2 | 4 | 5 | 4 | 0 | 0.0 | 0.0 | 40.0 | 39 | 1 |
| 6 | 49.0 | 4 | 160187.0 | 6 | 5.0 | 3 | 8 | 1 | 2 | 0 | 0.0 | 0.0 | 16.0 | 23 | 1 |
| 7 | 52.0 | 6 | 209642.0 | 11 | 9.0 | 2 | 4 | 0 | 4 | 1 | 0.0 | 0.0 | 45.0 | 39 | 0 |
| 8 | 31.0 | 4 | 45781.0 | 12 | 14.0 | 4 | 10 | 1 | 4 | 0 | 14084.0 | 0.0 | 50.0 | 39 | 0 |
| 9 | 42.0 | 4 | 159449.0 | 9 | 13.0 | 2 | 4 | 0 | 4 | 1 | 5178.0 | 0.0 | 40.0 | 39 | 0 |
Last rows
| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 48832 | 61.0 | 4 | 89686.0 | 11 | 9.0 | 2 | 12 | 0 | 4 | 1 | 0.0 | 0.0 | 48.0 | 39 | 1 |
| 48833 | 31.0 | 4 | 440129.0 | 11 | 9.0 | 2 | 3 | 0 | 4 | 1 | 0.0 | 0.0 | 40.0 | 39 | 1 |
| 48834 | 25.0 | 4 | 350977.0 | 11 | 9.0 | 4 | 8 | 3 | 4 | 0 | 0.0 | 0.0 | 40.0 | 39 | 1 |
| 48835 | 48.0 | 2 | 349230.0 | 12 | 14.0 | 0 | 8 | 1 | 4 | 1 | 0.0 | 0.0 | 40.0 | 39 | 1 |
| 48836 | 33.0 | 4 | 245211.0 | 9 | 13.0 | 4 | 10 | 3 | 4 | 1 | 0.0 | 0.0 | 40.0 | 39 | 1 |
| 48837 | 39.0 | 4 | 215419.0 | 9 | 13.0 | 0 | 10 | 1 | 4 | 0 | 0.0 | 0.0 | 36.0 | 39 | 1 |
| 48838 | 64.0 | 0 | 321403.0 | 11 | 9.0 | 6 | 0 | 2 | 2 | 1 | 0.0 | 0.0 | 40.0 | 39 | 1 |
| 48839 | 38.0 | 4 | 374983.0 | 9 | 13.0 | 2 | 10 | 0 | 4 | 1 | 0.0 | 0.0 | 50.0 | 39 | 1 |
| 48840 | 44.0 | 4 | 83891.0 | 9 | 13.0 | 0 | 1 | 3 | 1 | 1 | 5455.0 | 0.0 | 40.0 | 39 | 1 |
| 48841 | 35.0 | 5 | 182148.0 | 9 | 13.0 | 2 | 4 | 0 | 4 | 1 | 0.0 | 0.0 | 60.0 | 39 | 0 |
Most frequently occurring
| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | target | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 21.0 | 4 | 243368.0 | 13 | 1.0 | 4 | 5 | 1 | 4 | 1 | 0.0 | 0.0 | 50.0 | 26 | 1 | 3 |
| 23 | 25.0 | 4 | 195994.0 | 3 | 2.0 | 4 | 9 | 1 | 4 | 0 | 0.0 | 0.0 | 40.0 | 13 | 1 | 3 |
| 24 | 25.0 | 4 | 308144.0 | 9 | 13.0 | 4 | 3 | 1 | 4 | 1 | 0.0 | 0.0 | 40.0 | 26 | 1 | 3 |
| 0 | 17.0 | 4 | 153021.0 | 2 | 8.0 | 4 | 12 | 3 | 4 | 0 | 0.0 | 0.0 | 20.0 | 39 | 1 | 2 |
| 1 | 18.0 | 5 | 378036.0 | 2 | 8.0 | 4 | 5 | 3 | 4 | 1 | 0.0 | 0.0 | 10.0 | 39 | 1 | 2 |
| 2 | 19.0 | 0 | 167428.0 | 15 | 10.0 | 4 | 0 | 3 | 4 | 1 | 0.0 | 0.0 | 40.0 | 39 | 1 | 2 |
| 3 | 19.0 | 4 | 97261.0 | 11 | 9.0 | 4 | 5 | 1 | 4 | 1 | 0.0 | 0.0 | 40.0 | 39 | 1 | 2 |
| 4 | 19.0 | 4 | 130431.0 | 4 | 3.0 | 4 | 5 | 1 | 4 | 1 | 0.0 | 0.0 | 36.0 | 26 | 1 | 2 |
| 5 | 19.0 | 4 | 138153.0 | 15 | 10.0 | 4 | 1 | 3 | 4 | 0 | 0.0 | 0.0 | 10.0 | 39 | 1 | 2 |
| 6 | 19.0 | 4 | 139466.0 | 15 | 10.0 | 4 | 12 | 3 | 4 | 0 | 0.0 | 0.0 | 25.0 | 39 | 1 | 2 |